7.1.2 Central Limit Theorem 

The central limit theorem (CLT) is one of the most important results in probability theory. It states that, under certain conditions, the sum of a 
large number of random variables is approximately normal. Here, we state a version of the CLT that applies to i.i.d. random variables. Suppose 
that $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with expected values $EX_i = \mu < \infty$ and variance $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Then, as we saw above, the sample mean $\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$ has mean $E\overline{X} = \mu$ and variance $\mathrm{Var}(\overline{X}) = \frac{\sigma^2}{n}$. Thus, the normalized random variable

$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$

has mean $EZ_n = 0$ and variance $\mathrm{Var}(Z_n) = 1$. The central limit theorem states that the CDF of $Z_n$ converges to the standard normal CDF.
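The two expressions for $Z_n$ are the same quantity, written once in terms of the sample mean and once in terms of the sum. The minimal Python sketch below computes both on simulated data and also checks that $Z_n$ has mean close to 0 and variance close to 1 over many repetitions. The Uniform(0,1) summands (so $\mu = \frac{1}{2}$, $\sigma^2 = \frac{1}{12}$), $n = 50$, the seed, and the helper names `z_from_mean` and `z_from_sum` are illustrative assumptions, not part of the text.

```python
import math
import random

def z_from_mean(xs, mu, sigma):
    """Normalized sample mean: (Xbar - mu) / (sigma / sqrt(n))."""
    n = len(xs)
    xbar = sum(xs) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

def z_from_sum(xs, mu, sigma):
    """Normalized sum: (X_1 + ... + X_n - n*mu) / (sqrt(n) * sigma)."""
    n = len(xs)
    return (sum(xs) - n * mu) / (math.sqrt(n) * sigma)

# Illustrative assumption: Uniform(0, 1) summands, so mu = 1/2 and sigma^2 = 1/12.
mu, sigma, n = 0.5, math.sqrt(1 / 12), 50
rng = random.Random(0)  # fixed seed for reproducibility

xs = [rng.random() for _ in range(n)]
print(z_from_mean(xs, mu, sigma), z_from_sum(xs, mu, sigma))  # equal up to rounding

# Repeat the experiment many times to estimate E[Z_n] and Var(Z_n).
zs = [z_from_sum([rng.random() for _ in range(n)], mu, sigma) for _ in range(10000)]
mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / (len(zs) - 1)
print(f"sample mean of Z_n: {mean_z:.3f}, sample variance of Z_n: {var_z:.3f}")
```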

The Central Limit Theorem (CLT) 

Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with expected value $EX_i = \mu < \infty$ and variance $0 < \mathrm{Var}(X_i) = \sigma^2 < \infty$. Then, the random variable

$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$

converges in distribution to the standard normal random variable as $n$ goes to infinity, that is

$$\lim_{n \to \infty} P(Z_n \le x) = \Phi(x), \qquad \text{for all } x \in \mathbb{R},$$

where $\Phi(x)$ is the standard normal CDF.

An interesting thing about the CLT is that it does not matter what the distribution of the $X_i$'s is, as long as they are i.i.d. with finite, nonzero variance. The $X_i$'s can be discrete, continuous, or mixed random variables. To get a feeling for the CLT, let us look at some examples. Let's assume that the $X_i$'s are Bernoulli($p$). Then $EX_i = p$ and $\mathrm{Var}(X_i) = p(1-p)$. Also, $Y_n = X_1 + X_2 + \cdots + X_n$ has Binomial($n, p$) distribution. Thus,

$$Z_n = \frac{Y_n - np}{\sqrt{np(1-p)}},$$

where $Y_n \sim \text{Binomial}(n, p)$. Figure 7.1 shows the PMF of $Z_n$ for different values of $n$. As you see, the shape of the PMF gets closer to a normal PDF curve as $n$ increases. Here, $Z_n$ is a discrete random variable, so mathematically speaking it has a PMF, not a PDF. That is why the CLT states that the CDF (not the PDF) of $Z_n$ converges to the standard normal CDF. Nevertheless, since PMF and PDF are conceptually similar, the figure is useful in visualizing the convergence to the normal distribution.
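Because $Y_n$ is binomial, the CDF of $Z_n$ can be computed exactly and compared with $\Phi$. The Python sketch below measures the largest vertical gap between the two CDFs for a few values of $n$. It is only an illustration: $p = 0.3$ is an assumed value (the $p$ used in Figure 7.1 is not recoverable here), `max_cdf_gap` is a hypothetical helper, and $\Phi$ is computed from the standard library's `math.erf` via $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def max_cdf_gap(n, p):
    """Largest |P(Z_n <= z) - Phi(z)| over z, where Z_n = (Y_n - np) / sqrt(np(1-p))
    and Y_n ~ Binomial(n, p).

    The CDF of Z_n is a step function and Phi is increasing, so it suffices to check
    each jump point z_k both just before the jump and at the jump.
    """
    sd = math.sqrt(n * p * (1 - p))
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):
        z = (k - n * p) / sd
        before = cdf  # P(Y_n <= k - 1), the CDF just to the left of z
        cdf += math.comb(n, k) * p**k * (1 - p) ** (n - k)  # now P(Y_n <= k)
        gap = max(gap, abs(before - phi(z)), abs(cdf - phi(z)))
    return gap

p = 0.3  # assumed value for illustration
ns = (10, 30, 100)
gaps = [max_cdf_gap(n, p) for n in ns]
for n, g in zip(ns, gaps):
    print(f"n = {n:3d}: max |F_Zn(z) - Phi(z)| = {g:.4f}")
```

The gap shrinks as $n$ grows, which is the CDF convergence the theorem describes; it cannot reach zero for any fixed $n$, because the CDF of $Z_n$ has jumps while $\Phi$ is continuous.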
Fig. 7.1 - $Z_n$ is the normalized sum of $n$ independent Bernoulli($p$) random variables. The shape of its PMF, $P_{Z_n}(z)$, resembles the normal curve as $n$ increases.
As another example, let's assume that the $X_i$'s are Uniform($0, 1$). Then $EX_i = \frac{1}{2}$ and $\mathrm{Var}(X_i) = \frac{1}{12}$. In this case,

$$Z_n = \frac{X_1 + X_2 + \cdots + X_n - \frac{n}{2}}{\sqrt{n/12}}.$$

Figure 7.2 shows the PDF of $Z_n$ for different values of $n$. As you see, the shape of the PDF gets closer to the normal PDF as $n$ increases.
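The convergence for uniform summands can also be checked by simulation. This Python sketch draws many independent copies of $Z_n$ with $n = 30$ and compares the empirical $P(Z_n \le x)$ with $\Phi(x)$ at a few points. The sample size, the seed, and the helper `simulate_zn` are illustrative assumptions, and the empirical values carry some Monte Carlo noise.

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def simulate_zn(n, trials, rng):
    """Return `trials` independent draws of Z_n = (X_1 + ... + X_n - n/2) / sqrt(n/12),
    with X_i i.i.d. Uniform(0, 1)."""
    scale = math.sqrt(n / 12)
    return [(sum(rng.random() for _ in range(n)) - n / 2) / scale for _ in range(trials)]

rng = random.Random(1)  # fixed seed for reproducibility
zs = simulate_zn(n=30, trials=20000, rng=rng)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    empirical = sum(z <= x for z in zs) / len(zs)
    print(f"x = {x:+.1f}: empirical P(Z_n <= x) = {empirical:.4f}, Phi(x) = {phi(x):.4f}")
```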



Fig. 7.2 - $Z_n$ is the normalized sum of $n$ independent Uniform($0, 1$) random variables. The shape of its PDF, $f_{Z_n}(z)$, gets closer to the normal curve as $n$ increases.


We could have directly looked at $Y_n = X_1 + X_2 + \cdots + X_n$, so why do we normalize it first and say that the normalized version ($Z_n$) becomes approximately normal? This is because $\mathrm{Var}(Y_n) = n\sigma^2$ goes to infinity as $n$ goes to infinity, and so does $|EY_n| = n|\mu|$ whenever $\mu \neq 0$. We normalize $Y_n$ in order to have a finite mean and variance ($EZ_n = 0$, $\mathrm{Var}(Z_n) = 1$). Nevertheless, for any fixed $n$, the CDF of $Z_n$ is obtained by scaling and shifting the CDF of $Y_n$:

$$F_{Z_n}(z) = P\left(\frac{Y_n - n\mu}{\sqrt{n}\,\sigma} \le z\right) = F_{Y_n}\left(n\mu + \sqrt{n}\,\sigma z\right).$$

Thus, the two CDFs have similar shapes.

The importance of the central limit theorem stems from the fact that, in many real applications, a certain random variable of interest is a sum of 
a large number of independent random variables. In these situations, we are often able to use the CLT to justify using the normal distribution. 
Examples of such random variables are found in almost every discipline. Here are a few: 

• Laboratory measurement errors are usually modeled by normal random variables. 

• In communication and signal processing, Gaussian noise is the most frequently used model for noise. 

• In finance, the percentage changes in the prices of some assets are sometimes modeled by normal random variables. 

• When we do random sampling from a population to obtain statistical knowledge about the population, we often model the resulting 
quantity as a normal random variable. 

The CLT is also very useful in the sense that it can simplify our computations significantly. If you have a problem in which you are interested in a sum of one thousand i.i.d. random variables, it might be extremely difficult, if not impossible, to find the distribution of the sum by direct calculation. Using the CLT, we can immediately write down an approximate distribution for the sum, provided we know the mean and variance of the $X_i$'s.

Another question that comes to mind is how large $n$ should be so that we can use the normal approximation. The answer generally depends on the distribution of the $X_i$'s; for example, strongly skewed distributions typically need a larger $n$. Nevertheless, as a rule of thumb, it is often stated that if $n$ is larger than or equal to 30, then the normal approximation is very good.

Let's summarize how we use the CLT to solve problems: 

How to Apply The Central Limit Theorem (CLT) 

Here are the steps that we need in order to apply the CLT: 

1. Write the random variable of interest, $Y$, as the sum of $n$ i.i.d. random variables $X_i$:

$$Y = X_1 + X_2 + \cdots + X_n.$$


2. Find $EY$ and $\mathrm{Var}(Y)$ by noting that

$$EY = n\mu, \qquad \mathrm{Var}(Y) = n\sigma^2,$$

where $\mu = EX_i$ and $\sigma^2 = \mathrm{Var}(X_i)$.
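3. According to the CLT, conclude that

$$\frac{Y - EY}{\sqrt{\mathrm{Var}(Y)}} = \frac{Y - n\mu}{\sqrt{n}\,\sigma}$$

is approximately standard normal; thus, to find $P(y_1 \le Y \le y_2)$, we can write

$$P(y_1 \le Y \le y_2) = P\left(\frac{y_1 - n\mu}{\sqrt{n}\,\sigma} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{y_2 - n\mu}{\sqrt{n}\,\sigma}\right) \approx \Phi\left(\frac{y_2 - n\mu}{\sqrt{n}\,\sigma}\right) - \Phi\left(\frac{y_1 - n\mu}{\sqrt{n}\,\sigma}\right).$$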
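The three steps translate directly into a few lines of code. A minimal Python sketch, assuming only that $\mu$ and $\sigma^2$ are known; the function name `clt_interval_prob` is hypothetical, and $\Phi$ is built from the standard library's `math.erf`.

```python
import math

def phi(x):
    """Standard normal CDF, Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def clt_interval_prob(y1, y2, n, mu, sigma2):
    """CLT approximation of P(y1 <= Y <= y2) for Y = X_1 + ... + X_n,
    where the X_i are i.i.d. with E[X_i] = mu and Var(X_i) = sigma2."""
    mean_y = n * mu               # step 2: EY = n * mu
    sd_y = math.sqrt(n * sigma2)  # step 2: sqrt(Var(Y)) = sqrt(n) * sigma
    # step 3: standardize the endpoints and evaluate the standard normal CDF
    return phi((y2 - mean_y) / sd_y) - phi((y1 - mean_y) / sd_y)

print(round(clt_interval_prob(90, 110, n=50, mu=2, sigma2=1), 4))  # 0.8427
```

With the numbers of Example 7.1 ($n = 50$, $\mu = 2$, $\sigma^2 = 1$) this reproduces the value 0.8427 obtained there.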


Let us look at some examples to see how we can use the central limit theorem. 


Example 7.1 

A bank teller serves customers standing in the queue one by one. Suppose that the service time $X_i$ for customer $i$ has mean $EX_i = 2$ (minutes) and $\mathrm{Var}(X_i) = 1$. We assume that service times for different bank customers are independent. Let $Y$ be the total time the bank teller spends serving 50 customers. Find $P(90 < Y < 110)$.

• Solution 

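o $Y = X_1 + X_2 + \cdots + X_n$,

where $n = 50$, $EX_i = \mu = 2$, and $\mathrm{Var}(X_i) = \sigma^2 = 1$. Thus, we can write

$$\begin{aligned}
P(90 < Y < 110) &= P\left(\frac{90 - n\mu}{\sqrt{n}\,\sigma} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{110 - n\mu}{\sqrt{n}\,\sigma}\right)\\
&= P\left(\frac{90 - 100}{\sqrt{50}} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{110 - 100}{\sqrt{50}}\right)\\
&= P\left(-\sqrt{2} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \sqrt{2}\right).
\end{aligned}$$

By the CLT, $\frac{Y - n\mu}{\sqrt{n}\,\sigma}$ is approximately standard normal, so we can write

$$P(90 < Y < 110) \approx \Phi(\sqrt{2}) - \Phi(-\sqrt{2}) = 0.8427.$$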
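The CLT answer uses only the mean and variance of the service times. As a rough sanity check, the sketch below simulates the 50 service times under one specific distribution with mean 2 and variance 1. The Gamma distribution with shape 4 and scale 0.5 is purely an assumption for illustration (the example does not specify the distribution); it is sampled with the standard library's `random.gammavariate(alpha, beta)`, whose mean is `alpha * beta` and variance `alpha * beta**2`.

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, mu, sigma = 50, 2.0, 1.0
sd_y = math.sqrt(n) * sigma
clt = phi((110 - n * mu) / sd_y) - phi((90 - n * mu) / sd_y)

# Assumption for illustration only: service times ~ Gamma(shape=4, scale=0.5),
# which has mean 4 * 0.5 = 2 and variance 4 * 0.5**2 = 1.
rng = random.Random(7)  # fixed seed for reproducibility
trials = 10000
hits = 0
for _ in range(trials):
    y = sum(rng.gammavariate(4.0, 0.5) for _ in range(n))  # total time for 50 customers
    hits += 90 < y < 110
estimate = hits / trials

print(f"CLT approximation: {clt:.4f}, simulated estimate: {estimate:.4f}")
```

Swapping in any other service-time distribution with the same mean and variance should leave the simulated value close to the CLT answer, which is the point the example relies on.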


Example 7.2 

In a communication system, each data packet consists of 1000 bits. Due to noise, each bit may be received in error with probability 0.1. It is assumed that bit errors occur independently. Find the probability that there are more than 120 errors in a certain data packet.

• Solution 

o Let us define $X_i$ as the indicator random variable for the $i$th bit in the packet. That is, $X_i = 1$ if the $i$th bit is received in error, and $X_i = 0$ otherwise. Then the $X_i$'s are i.i.d. and $X_i \sim \text{Bernoulli}(p = 0.1)$. If $Y$ is the total number of bit errors in the packet, we have

$$Y = X_1 + X_2 + \cdots + X_n,$$

where $n = 1000$. Since $X_i \sim \text{Bernoulli}(p = 0.1)$, we have

$$EX_i = \mu = p = 0.1, \qquad \mathrm{Var}(X_i) = \sigma^2 = p(1 - p) = 0.09.$$

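Using the CLT, we have

$$\begin{aligned}
P(Y > 120) &= P\left(\frac{Y - n\mu}{\sqrt{n}\,\sigma} > \frac{120 - n\mu}{\sqrt{n}\,\sigma}\right)\\
&= P\left(\frac{Y - n\mu}{\sqrt{n}\,\sigma} > \frac{120 - 100}{\sqrt{90}}\right)\\
&\approx 1 - \Phi\left(\frac{20}{\sqrt{90}}\right) = 0.0175.
\end{aligned}$$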
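Since $Y$ here is exactly Binomial($1000, 0.1$), the CLT answer can be compared against the exact tail probability. A short Python sketch using only the standard library; the exact value is computed as $1 - P(Y \le 120)$ so that every binomial term stays in a comfortable floating-point range.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p = 1000, 0.1
mu, sigma2 = p, p * (1 - p)  # E[X_i] = 0.1 and Var(X_i) = 0.09 for Bernoulli(0.1)

# CLT approximation of P(Y > 120)
clt = 1 - phi((120 - n * mu) / math.sqrt(n * sigma2))

# Exact binomial tail, computed as 1 - P(Y <= 120)
exact = 1 - sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(121))

print(f"CLT approximation: {clt:.4f}, exact binomial: {exact:.4f}")
```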


Continuity Correction: 

Let us assume that $Y \sim \text{Binomial}(n = 20, p = \frac{1}{2})$, and suppose that we are interested in $P(8 \le Y \le 10)$. We know that a Binomial($n = 20, p = \frac{1}{2}$) random variable can be written as the sum of $n$ i.i.d. Bernoulli($p$) random variables:

$$Y = X_1 + X_2 + \cdots + X_n.$$

Since $X_i \sim \text{Bernoulli}(p = \frac{1}{2})$, we have

$$EX_i = \mu = p = \frac{1}{2}, \qquad \mathrm{Var}(X_i) = \sigma^2 = p(1 - p) = \frac{1}{4}.$$

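Thus, we may want to apply the CLT to write

$$\begin{aligned}
P(8 \le Y \le 10) &= P\left(\frac{8 - n\mu}{\sqrt{n}\,\sigma} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{10 - n\mu}{\sqrt{n}\,\sigma}\right)\\
&= P\left(\frac{8 - 10}{\sqrt{5}} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{10 - 10}{\sqrt{5}}\right)\\
&\approx \Phi(0) - \Phi\left(-\frac{2}{\sqrt{5}}\right) = 0.3145.
\end{aligned}$$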


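Since, here, $n = 20$ is relatively small, we can actually find $P(8 \le Y \le 10)$ exactly. We have

$$P(8 \le Y \le 10) = \sum_{k=8}^{10} \binom{n}{k} p^k (1 - p)^{n-k} = \left[\binom{20}{8} + \binom{20}{9} + \binom{20}{10}\right]\left(\frac{1}{2}\right)^{20} = 0.4565.$$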

We notice that our approximation is not so good. Part of the error is due to the fact that $Y$ is a discrete random variable and we are using a continuous distribution to find $P(8 \le Y \le 10)$. Here is a trick to get a better approximation, called the continuity correction. Since $Y$ can only take integer values, we can write
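$$\begin{aligned}
P(8 \le Y \le 10) &= P(7.5 \le Y \le 10.5)\\
&= P\left(\frac{7.5 - n\mu}{\sqrt{n}\,\sigma} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{10.5 - n\mu}{\sqrt{n}\,\sigma}\right)\\
&= P\left(\frac{7.5 - 10}{\sqrt{5}} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{10.5 - 10}{\sqrt{5}}\right)\\
&\approx \Phi\left(\frac{0.5}{\sqrt{5}}\right) - \Phi\left(\frac{-2.5}{\sqrt{5}}\right) = 0.4567.
\end{aligned}$$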



As we see, using the continuity correction, our approximation improved significantly. The continuity correction is particularly useful when we would like to find $P(y_1 \le Y \le y_2)$, where $Y$ is binomial and $y_1$ and $y_2$ are close to each other.
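All three numbers in this $\text{Binomial}(20, \frac{1}{2})$ example can be reproduced with the standard library alone (`math.comb` for the binomial coefficients, `math.erf` for $\Phi$). A minimal Python sketch:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p = 20, 0.5
mean_y, sd_y = n * p, math.sqrt(n * p * (1 - p))  # n*mu = 10, sqrt(n)*sigma = sqrt(5)

# Exact: [C(20,8) + C(20,9) + C(20,10)] * (1/2)^20
exact = sum(math.comb(n, k) for k in range(8, 11)) * p**n
# CLT without correction, using the integer endpoints 8 and 10
naive = phi((10 - mean_y) / sd_y) - phi((8 - mean_y) / sd_y)
# Continuity correction: P(8 <= Y <= 10) = P(7.5 <= Y <= 10.5) for integer-valued Y
corrected = phi((10.5 - mean_y) / sd_y) - phi((7.5 - mean_y) / sd_y)

print(f"exact {exact:.4f}, uncorrected {naive:.4f}, corrected {corrected:.4f}")
# exact 0.4565, uncorrected 0.3145, corrected 0.4567
```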


Continuity Correction for Discrete Random Variables 
Let $X_1, X_2, \ldots, X_n$ be independent discrete random variables taking integer values and let

$$Y = X_1 + X_2 + \cdots + X_n.$$

Suppose that we are interested in finding $P(A) = P(l \le Y \le u)$ using the CLT, where $l$ and $u$ are integers. Since $Y$ is an integer-valued random variable, we can write

$$P(A) = P\left(l - \frac{1}{2} \le Y \le u + \frac{1}{2}\right).$$

It turns out that the above expression sometimes provides a better approximation for $P(A)$ when applying the CLT. This is called the continuity correction, and it is particularly useful when the $X_i$'s are Bernoulli (i.e., $Y$ is binomial).
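The boxed rule can be packaged as a small function with a switch for the correction. This Python sketch (function names `clt_integer_prob` and `binom_prob` are hypothetical) compares both versions against exact probabilities for a few intervals of a Binomial($20, \frac{1}{2}$) variable. In these particular cases the corrected version is the closer one, although the box only claims that it sometimes helps.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def clt_integer_prob(l, u, n, mu, sigma2, continuity=True):
    """CLT approximation of P(l <= Y <= u) for integer-valued Y = X_1 + ... + X_n.

    With continuity=True the integer endpoints are widened to l - 1/2 and u + 1/2.
    """
    half = 0.5 if continuity else 0.0
    mean_y, sd_y = n * mu, math.sqrt(n * sigma2)
    return phi((u + half - mean_y) / sd_y) - phi((l - half - mean_y) / sd_y)

def binom_prob(l, u, n, p):
    """Exact P(l <= Y <= u) for Y ~ Binomial(n, p), for comparison."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(l, u + 1))

n, p = 20, 0.5
for l, u in [(10, 10), (9, 11), (5, 15)]:
    exact = binom_prob(l, u, n, p)
    plain = clt_integer_prob(l, u, n, p, p * (1 - p), continuity=False)
    corrected = clt_integer_prob(l, u, n, p, p * (1 - p))
    print(f"P({l} <= Y <= {u}): exact {exact:.4f}, CLT {plain:.4f}, corrected {corrected:.4f}")
```

The case $l = u = 10$ shows the effect most clearly: without the correction the interval has zero width and the approximation is 0, while the corrected interval $[9.5, 10.5]$ gives a value close to the exact probability.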